An analysis of model-based Interval Estimation for Markov Decision Processes
Abstract
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents a theoretical analysis of MBIE and a new variation called MBIE-EB, proving their efficiency even under worst-case conditions. The paper also introduces a new performance metric, average loss, and relates it to its less “online” cousins from the literature.
Similar resources
Availability analysis of mechanical systems with condition-based maintenance using semi-Markov and evaluation of optimal condition monitoring interval
Maintenance helps to extend equipment life by improving its condition and avoiding catastrophic failures. An appropriate model or mechanism is thus needed to quantify system availability vis-a-vis a given maintenance strategy, which will assist in decision-making for optimal utilization of maintenance resources. This paper deals with semi-Markov process (SMP) modeling for steady state availabili...
A Theoretical Analysis of Model-Based Interval Estimation: Proofs
Several algorithms for learning near-optimal policies in Markov Decision Processes have been analyzed and proven efficient. Empirical results have suggested that Model-based Interval Estimation (MBIE) learns efficiently in practice, effectively balancing exploration and exploitation. This paper presents the first theoretical analysis of MBIE, proving its efficiency even under worst-case conditi...
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
A Multi Objective Fibonacci Search Based Algorithm for Resource Allocation in PERT Networks
The problem we investigate deals with the optimal assignment of resources to the activities of a stochastic project network. We seek to minimize the expected cost of the project, which includes the sum of resource utilization costs and lateness costs. We assume that the work content required by the activities follows an exponential distribution. The decision variables of the model are the allocated resourc...
Extended Geometric Processes: Semiparametric Estimation and Application to Reliability
Imperfect repair, Markov renewal equation, replacement policy
Lam (2007) introduces a generalization of renewal processes named Geometric processes, where inter-arrival times are independent and identically distributed up to a multiplicative scale parameter, in a geometric fashion. We here envision a more general scaling, not necessarily geometric. The corresponding counting process is named Extended Geometric Process (EGP). Semiparametric estimates are...
Journal: J. Comput. Syst. Sci.
Volume: 74
Publication date: 2008